Rank | Count | Beginning |
---|---|---|
81920 | 4424 | W |
46407 | 1891 | Nie |
41793 | 1667 | Na |
28107 | 1439 | Jeśli |
76303 | 1307 | To |
22 | 1169 | A |
57057 | 1116 | Po |
24916 | 969 | Jak |
10837 | 961 | Czy |
93260 | 948 | Z |
22382 | 889 | I |
17320 | 817 | Dzięki |
29585 | 814 | Jest |
1651 | 710 | Ale |
14068 | 666 | Do |
95 | 630 | Aby |
8904 | 612 | Co |
30789 | 461 | Jeżeli |
13063 | 431 | Dla |
40223 | 424 | Możesz |
51572 | 422 | Od |
24758 | 411 | Ja |
26931 | 386 | Jednak |
40862 | 366 | Można |
20452 | 356 | Gdy |
73811 | 350 | Tak |
4461 | 341 | Bardzo |
39879 | 335 | Moze |
32416 | 321 | Kiedy |
13550 | 317 | Dlatego |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
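The same counting can be sketched outside the database. The following is a minimal Python sketch, not the tooling used to produce the table above: the function name `sentence_beginnings`, its parameters, and the sample sentences are hypothetical and serve only to illustrate the logic of the query. It assumes words are separated by single spaces, as the query does.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Count the first n space-separated words of each sentence.

    Hypothetical helper: mirrors substring_index(sentence, ' ', n)
    followed by GROUP BY / ORDER BY cnt DESC / LIMIT.
    """
    counts = Counter()
    for sentence in sentences:
        # Keep everything before the n-th space; a sentence with fewer
        # than n spaces is kept whole, as substring_index would do.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count and truncates to `limit`.
    return counts.most_common(limit)


# Made-up sample sentences, for illustration only (not corpus data).
sample = [
    "W domu jest cicho.",
    "W lesie rosną drzewa.",
    "Nie wiem.",
    "Na stole leży książka.",
    "W domu jest ciepło.",
]

top1 = sentence_beginnings(sample, n=1)
top2 = sentence_beginnings(sample, n=2)
print(top1)
print(top2)
```

Setting `n` to 2, 3 or 4 gives the longer beginnings covered in the following subsections; in the query itself this corresponds to changing the third argument of `substring_index`.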